The Prisoner's Dilemma (PD) game is used in several fields because it exhibits the emergence of
cooperation among selfish players. Here, we consider a one-dimensional lattice, where each cell
represents a player who can either cooperate or defect. This one-dimensional geometry allows us to
retrieve the results obtained for regular lattices and to keep track of the spatio-temporal evolution
of the system. Players play PD with their neighbors and update their states using the Pavlovian
Evolutionary Strategy: if a player receives a payoff greater than an aspiration level, he/she keeps
his/her state; otherwise, he/she switches it. We obtain the critical temptation values analytically,
present the cluster patterns that emerge from the local interactions among players, and explore the
parameter space. The numerical results agree with the analytical critical temptation values and
confirm that the Pavlovian strategy fosters cooperation among the players and avoids defection.
The system also presents a new phase in the steady state, the quasi-regular phase, in which many
players switch their states from round to round while the proportion of cooperators does not
change significantly.



Keywords: Game Theory, Prisoner's Dilemma, Pavlovian Evolutionary Strategy,
Emergence of Cooperation, Critical Temptation, Phase Transition, Cellular Automata.
I. INTRODUCTION

Prisoner's Dilemma (PD) is a game in which two players
confront each other and each one can either cooperate or
defect. Players receive a payoff $R$ (reward) in the case
of mutual cooperation and a payoff $P$ (punishment) if
both defect. If one player cooperates and the
other defects, they receive $S$ (sucker) and $T$ (temptation),
respectively. These payoff values must satisfy the
inequalities $T > R > P > S$ and $T + S < 2R$ to create
a dilemma. In a single-round game the best choice
is defection, since it assures a higher payoff than
cooperation regardless of the opponent's decision (Nash
equilibrium). However, mutual defection yields a lower
payoff to both players than mutual cooperation, which
generates the dilemma.
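The two inequalities and the single-round dominance of defection can be checked numerically. The following is a minimal sketch: the function names are hypothetical and the payoff values are illustrative assumptions.

```python
def is_prisoners_dilemma(T, R, P, S):
    """Check the two conditions T > R > P > S and T + S < 2R."""
    return T > R > P > S and T + S < 2 * R


def single_round_best_reply(opponent_cooperates, T, R, P, S):
    """Return the action ('C' or 'D') with the higher payoff against a fixed opponent action."""
    payoff_if_cooperate = R if opponent_cooperates else S
    payoff_if_defect = T if opponent_cooperates else P
    return "D" if payoff_if_defect > payoff_if_cooperate else "C"


# Illustrative payoff values (an assumption made for this example).
T, R, P, S = 5, 3, 1, 0
print(is_prisoners_dilemma(T, R, P, S))
print(single_round_best_reply(True, T, R, P, S),
      single_round_best_reply(False, T, R, P, S))
```

Whatever the opponent does, the best reply is to defect, although both players would earn more ($R > P$) under mutual cooperation.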

When the PD is played repeatedly, it is called the Iterated
Prisoner's Dilemma (IPD). In the computer tournament
proposed by Axelrod [1, 2] to compare different strategies
playing IPD, a simple strategy with only one time step
of memory, called tit-for-tat (TFT), was by far the most
stable one. A player using TFT cooperates in the first
round and subsequently copies the opponent's action from
the previous round. The dilemma, together with cooperation as a
profitable behavior among selfish agents, makes the PD the most
prominent game in Game Theory. It is used to model
problems in several research fields [3, 4, 5, 6].

Here, we consider the IPD, but now each player is a
cell of a one-dimensional automaton and can play with
$z$ neighbors. This geometry is equivalent to that of players
in regular lattices with $z$ neighbors. In a non-stochastic
IPD game, players interact according to deterministic rules
during the time evolution. All players play against
their respective neighbors and update their states. This
process is called a round and it is the time unit of the system.
After a long enough time, the system may reach a steady state,
where the asymptotic proportion of cooperators, $\rho_\infty$,
becomes time independent. The state update process
depends on the adopted evolutionary strategy [7],
namely the Darwinian Evolutionary Strategy (DES) [8] or the
Pavlovian Evolutionary Strategy (PES) [9], both of which are
considered here.

In the DES, the update process uses the strategy of
copying the behavior of the best adapted player (fittest player),
also known as the "survival of the fittest". This is
equivalent to Darwin's principle of natural selection [10].
The fittest player is the one who receives the greatest
payoff. Each player compares his/her own payoff to those of
the neighbors and then copies the state of the fittest
neighbor.

For the PES, let us consider the following learning techniques.
Win-stay, lose-shift (WSLS) is a general learning
method used for iterated decision problems of all kinds.
It was proposed by Thorndike (1911) [11], who assumed that
actions that yield satisfaction are reinforced and
actions that yield discomfort are weakened. This
strategy is also called Pavlov. Kraines and Kraines [12]
use positive and negative reinforcement to teach an
individual to respond. In the PD context, an individual is a
player. For example, in the first round, a player randomly
chooses the action C or D (to cooperate or to defect).
He/she plays the game and evaluates the outcome. If
he/she receives a reward due to action C, this individual
will be more prone to keep the action C. Otherwise,
if he/she is punished due to the action C, it becomes
more probable that the individual changes his/her
action to D. This process can be thought of as the strategy
"never change a winning team". If it is desirable that
an individual plays C, then he/she must be rewarded
or punished repeatedly, according to his/her choice, to
reinforce the action C. In the PD, under these propositions,
a player keeps a given action when he/she receives
a payoff $R$ or $T$ and switches if his/her payoff is $S$ or
$P$. Namely, a player keeps his/her action when playing
against a cooperator and switches it when he/she
confronts a defector.

Another possible way to use the Pavlov principle is
to set an aspiration level (AL) for the IPD player [13].
The payoff can be lower than, equal to or greater than the AL. If
a player receives a payoff higher than the AL, he/she does not
change his/her state; otherwise, he/she switches it. In the PES, in
general, all players have the same aspiration level.
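The two Pavlovian rules described above can be written as two small update functions. This is only a sketch: the function names, the 'C'/'D' and 0/1 encodings and the default aspiration level of zero are assumptions made for the example.

```python
def wsls_next_action(action, opponent_action):
    """Win-stay, lose-shift: keep the action after a payoff R or T
    (the opponent cooperated), switch it after S or P (the opponent defected)."""
    if opponent_action == "C":
        return action
    return "D" if action == "C" else "C"


def aspiration_next_state(state, payoff, aspiration_level=0.0):
    """Aspiration-level rule: keep the state (0 = defector, 1 = cooperator)
    only if the payoff is strictly greater than the aspiration level."""
    return state if payoff > aspiration_level else 1 - state


print(wsls_next_action("C", "D"))        # a cooperator facing a defector switches
print(aspiration_next_state(1, -0.5))    # a payoff below the aspiration level forces a switch
```

In the first rule the decision depends on the opponent's last action, whereas in the second one it depends only on the player's own accumulated payoff.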

Pavlov-based strategies are very robust in situations such
as: the presence of noise, i.e., a player can switch his/her
state at any moment with probability $p > 0$, regardless
of the strategy adopted by this player (mutation) [14]; playing
against deceiving or profiteering strategies [15]; and competition
for survival in coevolutionary games [16, 17]. Their
important features are: they do not forgive a defection; they
exploit altruistic strategies as long as they are not punished with
a defection; and they can correct occasional mistakes (noisy
environment), which does not happen with the tit-for-tat
strategy, for instance. Nevertheless, if the Pavlovian strategy
is used as WSLS with an aspiration level, it presents a
weakness: it can be exploited by defective strategies. It
seems contradictory to be robust against profiteering strategies
and yet to be exploitable by defective strategies.
This happens because a given player is concerned only with
his/her own payoff and does not care about the opponent's
payoff.

The main variable of the PD is the temptation. The PD
order parameter is the proportion of cooperators. As
the system evolves, it passes through a transient regime
and eventually reaches a steady state, which defines the
phase of the system. If the other payoff values are kept
constant and only $T$ is varied, critical temptation values
appear. A critical temptation is a temptation value at which
the total payoff of some players forces them to
switch their states, generating a phase transition.
Critical temptation values depend on the adopted strategy,
on the system connectivity and on the neighborhood
configuration.

In this paper, we present and solve analytically the
critical temptation for the IPD in a one-dimensional
cellular automaton with a variable number of interacting
neighbors for the Pavlovian Evolutionary Strategy. The
one-dimensional geometry allows us to keep track of the time
evolution (history in a static two-dimensional image) and
the steady-state quantitative results obtained are similar to
those of square lattices [8]. In Section II we introduce
the model. In Section III we derive analytically the
critical temptation values for the PES. In Section IV we
present the quasi-regular phase, which is a new phase
that emerged from our numerical results. We also present
the cluster patterns that emerge during the time evolution
and the exploration of the parameter space (temptation to
defect, $T$, and initial proportion of cooperators, $\rho_0$) for
some connectivity values, $z$. Final remarks are presented
in Section V. The pattern formation that gives rise to the
quasi-regular phase is presented in more detail in
Appendix A.

II. THE ONE-DIMENSIONAL MODEL 

Consider a one-dimensional cellular automaton with $L$
cells, where each cell represents a player. Each player
has two possible states: $\theta_i = 0$ (defector) or $\theta_i = 1$
(cooperator) (see Fig. 1). The automaton has no empty cells,
so that $\rho_c(t) + \rho_d(t) = 1$, with $\rho_c(t) = (1/L) \sum_{i=1}^{L} \theta_i(t)$,
where $\rho_c(t)$ is the proportion of cooperators at time $t$ and
$\rho_d(t)$ is the proportion of defectors. The initial proportion
of cooperators, $0 \le \rho_c(0) = \rho_0 \le 1$, is one of the parameters
of the problem. The positions of the $L\rho_0$ cooperators in
the automaton are drawn at random from a uniform distribution.
The initial configuration is the only source of stochasticity in the
model.
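A random initial configuration with a prescribed number of cooperators can be generated as sketched below. The function name and the `seed` argument are hypothetical, and rounding $L\rho_0$ to the nearest integer is an assumption of this example.

```python
import random


def initial_configuration(L, rho0, seed=None):
    """Return a list of L states (1 = cooperator, 0 = defector) with
    round(L * rho0) cooperators placed at uniformly random positions."""
    rng = random.Random(seed)
    n_cooperators = round(L * rho0)
    cooperator_sites = set(rng.sample(range(L), n_cooperators))
    return [1 if i in cooperator_sites else 0 for i in range(L)]


states = initial_configuration(L=11, rho0=0.5, seed=1)
print(states, sum(states) / len(states))
```

Sampling positions without replacement fixes the number of cooperators exactly, instead of drawing each cell independently with probability $\rho_0$.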




FIG. 1: Cellular automaton in the one-dimensional lattice 
with L = 11 players and open boundary condition. Blue cell 
(dark gray): cooperator, red one (light gray): defector. 

Consider the $i$-th player. His/her neighborhood size (or
connectivity) is $z = \{1, 2, \ldots, L\}$. If $z$ is even, there
are $\alpha = z/2$ adjacent players on the right-hand side and
another $\alpha = z/2$ on the left-hand side (see Fig. 2a). If $z$ is
odd, each side has $\alpha = (z - 1)/2$ interacting players and
player $i$ also interacts with his/her own state (self-interaction)
(see Fig. 2b) [ii,^!,!!^. Nowak and May [18] argue that
self-interaction makes sense, for example, if several animals
(a family) or molecules can occupy a single cell. The
self-interaction is then regarded as an intra-group interaction.

FIG. 2: Cellular automaton in the one-dimensional lattice
with $L = 11$ players. The central player (arrow origin)
interacts with the neighbors indicated by arrows: (a) $z = 8$
(without self-interaction); (b) odd $z$ (with self-interaction).

Using the one-dimensional topology, it is possible to
vary the lattice connectivity $z$ to any integer value in
the range $1 \le z \le L$. This is not possible, for
instance, in a square lattice, because it is limited to the von
Neumann ($z = 4$, see Fig. 3a) or Moore ($z = 8$, see
Fig. 3b) neighborhoods. In a square lattice, if $z$ is
different from $z = \{4; 8; 24\}$, the neighborhood is
asymmetric. For instance, to obtain $z = 6$, one must
consider the honeycomb lattice. Since the critical temptation
values depend only on the coordination number, these
neighborhoods may be represented in a one-dimensional
lattice, where $z = \{4; 5\}$ corresponds to the von Neumann
neighborhood, $z = \{8; 9\}$ matches the Moore one
and $z = \{6; 7\}$ the honeycomb case, without and with
self-interaction, respectively. We have used periodic
boundary conditions (PBC), so that every player has the same
connectivity. Since the lattice is one-dimensional, the
boundary effect is smaller than that observed in $d$-dimensional
lattices [19, 20].
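The neighborhood rule with periodic boundary conditions can be sketched as follows. The function name is hypothetical, and the sketch assumes that $z$ is small enough for the left and right neighbors not to overlap ($2\lfloor z/2 \rfloor < L$).

```python
def neighborhood(i, z, L):
    """Indices of the players that player i interacts with on a ring of L cells.

    Even z: z/2 neighbors on each side.
    Odd z: (z - 1)/2 neighbors on each side plus player i itself (self-interaction).
    """
    half = z // 2
    indices = [(i + k) % L for k in range(-half, half + 1) if k != 0]
    if z % 2 == 1:
        indices.append(i)  # self-interaction for odd z
    return indices


print(neighborhood(0, 4, 11))   # two neighbors on each side, wrapping around the ring
print(neighborhood(5, 5, 11))   # same neighbors plus the player itself
```

The modulo operation implements the periodic boundary, so that every player has exactly $z$ interaction partners.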



III. ANALYTICAL CALCULATION OF 
CRITICAL TEMPTATION 

FIG. 3: Neighborhood representation in the square lattice: (a)
von Neumann, $z = 4$, and (b) Moore, $z = 8$; and their
representations in the one-dimensional lattice for (c) $z = 4$
and (d) $z = 8$. Black: central player; dark gray: first
neighbors; light gray: second neighbors. Remember
that for even $z$ there is no self-interaction.

To present the Pavlov critical temptation calculation,
we first briefly review the results obtained by Duran and
Mulet [21] for the Darwinian strategy. For the payoff
evaluation, consider the parameters $T$, $R$, $P$ and $S$.
Consider two players $i$ and $j$ playing PD in a cellular
automaton. The payoff of player $i$ with respect to player $j$ is:

$$g_{i,j} = T(1-\theta_i)\theta_j + R\,\theta_i\theta_j + P(1-\theta_i)(1-\theta_j) + S\,\theta_i(1-\theta_j), \qquad (1)$$

where $\theta_k$ is the state of player $k$, with $k = \{1; 2; \ldots; L\}$.
The total payoff of the $i$-th player is $G_i = \sum_{j=1}^{z} g_{i,j}$. It is
noteworthy that, only if $z$ is odd, there is an extra payoff
component $g_{i,i}$, due to self-interaction. From Eq. 1,
the payoffs of player $i$ due to the interaction with a single
defector ($\theta_j = 0$) and a single cooperator ($\theta_j = 1$) are:

$$g_{i,j} = \begin{cases} P(1-\theta_i) + S\theta_i & \text{if } \theta_j = 0, \\ T(1-\theta_i) + R\theta_i & \text{if } \theta_j = 1. \end{cases} \qquad (2)$$

The payoff of player $i$ due to interactions with $c_i$ cooperators
within the $z$ neighbors is $G_i^{(c)}(\theta_i) = [T(1-\theta_i) + R\theta_i]c_i$,
and with $d_i$ defectors it is $G_i^{(d)}(\theta_i) = [P(1-\theta_i) + S\theta_i]d_i$.
The sum of the payoffs due to the interactions with
all the $z$ neighbors leads to the total payoff of the $i$-th player:
$G_i(\theta_i) = [T(1-\theta_i) + R\theta_i]c_i + [P(1-\theta_i) + S\theta_i]d_i$.
Since the numbers of cooperators and defectors in a given
neighborhood are complementary, $d_i = z - c_i$:

$$G_i(\theta_i) = T c_i + P(z - c_i) + [(R-T)c_i + (S-P)(z-c_i)]\theta_i. \qquad (3)$$

Therefore, the total payoff of player $i$ is given by:

$$G_i(\theta_i) = \begin{cases} T c_i + P(z-c_i) & \text{if } \theta_i = 0, \\ R c_i + S(z-c_i) & \text{if } \theta_i = 1. \end{cases} \qquad (4)$$

In the following, we determine the payoffs for the DES
and PES and the critical temptation values, which
depend on the adopted strategy.
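The pairwise payoff and the closed form of the total payoff as a function of the number $c_i$ of cooperating neighbors can be cross-checked numerically. The sketch below uses hypothetical function names and illustrative payoff values; it assumes binary states (0 = defector, 1 = cooperator).

```python
def pair_payoff(theta_i, theta_j, T, R, P, S):
    """Payoff of player i against player j for binary states theta_i, theta_j."""
    return (T * (1 - theta_i) * theta_j + R * theta_i * theta_j
            + P * (1 - theta_i) * (1 - theta_j) + S * theta_i * (1 - theta_j))


def total_payoff(theta_i, c_i, z, T, R, P, S):
    """Total payoff of player i with c_i cooperators among its z neighbors."""
    d_i = z - c_i
    return T * c_i + P * d_i if theta_i == 0 else R * c_i + S * d_i


# Illustrative values (assumptions for this example): z = 4 neighbors, 3 cooperators.
T, R, P, S = 5, 3, 1, 0
neighbor_states = [1, 1, 1, 0]
for theta_i in (0, 1):
    summed = sum(pair_payoff(theta_i, theta_j, T, R, P, S) for theta_j in neighbor_states)
    print(theta_i, summed, total_payoff(theta_i, sum(neighbor_states), 4, T, R, P, S))
```

Both columns agree for each state, which reflects that the total payoff depends only on $c_i$ and $z$ and not on where the cooperators sit in the neighborhood.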



1. Darwinian Evolutionary Strategy (DES) 

Nowak and May [8] used the parameter set $R = 1$,
$P = S = 0$, leaving only one free parameter, the
temptation $1 < T < 2$, which ensures the conflict conditions.
These values are different from those originally defined
by Tucker [22] ($T = 5$, $R = 3$, $P = 1$, $S = 0$). The
conditions $T > R > P > S$ and $T + S < 2R$ have been relaxed
($P = S$; for $T = 1$, $T = R$; and for $T = 2$, $T + S = 2R$)
without any harm to the PD conflict features. This
modification is known as the Weak Prisoner's Dilemma. Placing
the values adopted by Nowak and May for these parameters
in Eq. 1, the payoff becomes $g_{i,j} = T(1-\theta_i)\theta_j + \theta_i\theta_j$.
A similar result has been obtained by Duran and Mulet
[21]: $g_{i,j} = T(1 - \theta_i\theta_j)\theta_j + \theta_i\theta_j$. The difference between
our result and that of Duran and Mulet is the presence
of $\theta_j$ multiplying $\theta_i$ inside the parentheses, which is
unnecessary and, for binary states, does not alter the result.

However, we notice that, if the players' states can
assume rational values, the Duran and Mulet result is not
valid. This situation occurs in the Continuous Prisoner's
Dilemma (CPD) [ll, [13, 111, where a player has a
cooperation level (CL), with $0 \le CL \le 1$, instead of only
defecting or cooperating. For the CPD our result gives
the correct payoff values, considering the linear
interpolation for intermediate values.

Considering $R = 1$, $P = S = 0$ in Eq. 3, we have
$G_i(\theta_i) = [T - (T-1)\theta_i]c_i$. Notice that: (i) the
payoff of a cooperator who plays with $c_i$ cooperators is
$G_i^{(c)}(\theta_i = 1) = c_i$; while (ii) a defector who plays with
$c_i$ cooperators has a payoff equal to $G_i^{(c)}(\theta_i = 0) = c_i T$.
For $T > 1$: (i) $G_i^{(c)}(\theta_i = 0) > G_i^{(c)}(\theta_i = 1)$; and (ii)
$G_i^{(c)}(\theta) > G_i^{(d)}(\theta)$. In the DES, the payoff of each player is
always non-negative, $G_i \ge 0$. After all players calculate
their payoffs, they update their states. During this
process, each player $i$ compares his/her payoff $G_i$ with $G_k$,
where $G_k$ is the payoff of his/her $k$-th neighbor, with
$k = \{1; 2; \ldots; z\}$. If $G_i < G_k$ and $G_k = \max[G_1, \ldots, G_z]$,
player $i$ replicates the state of player $k$; otherwise he/she
maintains his/her current state.
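The DES update rule described above can be sketched as a synchronous step over all players. The function name and the data layout are hypothetical, and the tie-breaking among equally fit neighbors (the first one in the list is taken) is an assumption, since the text does not specify it.

```python
def des_step(states, payoffs, neighbors):
    """One synchronous DES update.

    states[i]    : current state of player i (0 = defector, 1 = cooperator)
    payoffs[i]   : total payoff of player i in this round
    neighbors[i] : indices of the players that player i compares itself with
    """
    new_states = list(states)
    for i, nbrs in enumerate(neighbors):
        fittest = max(nbrs, key=lambda k: payoffs[k])  # ties: first neighbor in the list
        if payoffs[fittest] > payoffs[i]:
            new_states[i] = states[fittest]            # copy the fittest neighbor
    return new_states


# Three players on a ring: the central defector earns the most and is copied.
print(des_step([1, 0, 1], [1.0, 3.0, 1.0], [[2, 1], [0, 2], [1, 0]]))
```

All comparisons use the states and payoffs of the current round, so the update order of the players does not matter.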

The system evolves until it eventually reaches the steady
state, where the proportion of cooperators $\rho_\infty$ is
stationary. The $\rho_\infty$ phase transitions occur when the
temptation value passes through the critical values $T_c$. In the
conflict region, $1 < T < 2$, these transitions have been
calculated [21]: $T_c(n, m) = (z - n)/(z - n - m)$, where
$0 \le n < z$ and $1 \le m \le \mathrm{int}[(z - n - 1)/2]$ are
integers¹. For example, for $z = 8$, these values are
$T_c = (8/7, 8/6, 8/5, 8/4)$.



2. Pavlovian Evolutionary Strategy (PES)

Following the same line of reasoning as used for the DES,
we present for the first time the calculation of the critical
temptation values for the PES. The parameters used are
$P = -R$ and $S = -T$, which are placed in Eq. 4:

$$G_i(\theta_i) = \begin{cases} T c_i - R(z-c_i) & \text{if } \theta_i = 0, \\ R c_i - T(z-c_i) & \text{if } \theta_i = 1. \end{cases} \qquad (5)$$

For a system using the PES, each player's payoff can be either
positive or negative, in the range $-zT \le G_i \le zT$ (the extreme
cases of Eq. 5 are $c_i = 0$ and $c_i = z$).

Each player $i$ evaluates his/her payoff $G_i$. If the payoff
is greater than the aspiration level ($G_i > AL$, with $AL = 0$),
the player maintains his/her current state; otherwise,
he/she switches it. We have defined the
aspiration level as a null payoff, but any other value can
be chosen.

For player $i$ to switch his/her state, it is necessary
that his/her payoff be null or negative, that is, $G_i(\theta_i) \le 0$.
Applying this condition to Eq. 5, one has:

$$G_i(\theta_i) = \begin{cases} c_i T - (z-c_i)R \le 0 & \text{if } \theta_i = 0, \\ c_i R - (z-c_i)T \le 0 & \text{if } \theta_i = 1. \end{cases} \qquad (6)$$

The transition between keeping and switching the state occurs
at the null payoff, $G_i(\theta_i) = 0$. For a defector, this means
$c_i T_c - (z - c_i)R = 0$, which
leads to the critical temptation value $T_c = [(z - c_i)/c_i]R$;
for a cooperator, the null payoff occurs when
$c_i R - (z - c_i)T_c = 0$ and $T_c = [c_i/(z - c_i)]R$. These two
cases can be written as a single equation:

$$T_c(z, c_i) = \left[\frac{z - c_i}{c_i}\right]^{1 - 2\theta_i} R. \qquad (7)$$

The relevant variable is $[(z - c_i)/c_i]^{1 - 2\theta_i}$, which strongly
contrasts with the DES one, $(z - n)/(z - n - m)$. However,
notice that, as in the DES, it does not depend on the
configuration of the $c_i$ cooperators within the $z$ neighbors;
it depends only on the values of $c_i$ and $z$. For this
reason, we can use the one-dimensional geometry in the
following.

An interesting feature observed for $T_c$ in the PES is its
dependence on the player state. Critical temptation values
are the same for defectors and cooperators, but they
appear in reverse order. For example, consider a cooperator
playing against $z = 4$ neighbors: if there is no cooperator
in the neighborhood, $T_c(4, 0) = 0$; with one
cooperator, $T_c(4, 1) = R/3$; and so on, so that $T_c(4, c_i) =
\{0; R/3; R; 3R; \infty\}$ for $c_i = \{0; 1; 2; 3; 4\}$. Now
consider a defector in the same situation: $T_c(4, c_i) =
\{\infty; 3R; R; R/3; 0\}$ for $c_i = \{0; 1; 2; 3; 4\}$.


IV. NUMERICAL RESULTS: EMERGENCE OF
THE NEW QUASI-REGULAR PHASE



¹ For $x$ positive, the function int($x$) gives the largest integer less
than or equal to $x$.



We have written a numerical code to simulate the PD
adopting the PES in a one-dimensional cellular automaton.
The parameters are: $L = 1,000$ cells, with $L\rho_0$ being
the number of cooperators and the remaining ones the
number of defectors. To be statistically meaningful, the
asymptotic proportions of cooperators, $\rho_\infty$, are averages
obtained over 1,000 realizations. The quantity $T$ varies in
steps $\Delta T = 0.01$ in the range $1 < T < 2$ and $\rho_0$ varies
in steps $\Delta\rho_0 = 0.1$ in the range $0 \le \rho_0 \le 1$. The
spatio-temporal patterns yielded by the cooperative/defective
clusters are presented using smaller systems than those
used to calculate $\rho_\infty$.
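A simulation of this kind can be sketched in a few lines. The code below is not the authors' program: it is a minimal illustration that assumes a ring with periodic boundaries, $R = 1$, $P = -R$, $S = -T$, a null aspiration level and synchronous updates, with hypothetical function names and a much smaller system than the one quoted above.

```python
import random


def pes_round(states, z, T, R=1.0, aspiration=0.0):
    """One synchronous PES round on a ring (periodic boundary conditions)."""
    L = len(states)
    half = z // 2
    new_states = []
    for i, s in enumerate(states):
        c = sum(states[(i + k) % L] for k in range(-half, half + 1) if k != 0)
        if z % 2 == 1:
            c += s                      # self-interaction for odd z
        d = z - c
        payoff = T * c - R * d if s == 0 else R * c - T * d
        new_states.append(s if payoff > aspiration else 1 - s)
    return new_states


def simulate(L, rho0, z, T, rounds, seed=0):
    """Return the list of configurations visited, starting from a random one."""
    rng = random.Random(seed)
    cooperators = set(rng.sample(range(L), round(L * rho0)))
    states = [1 if i in cooperators else 0 for i in range(L)]
    history = [states]
    for _ in range(rounds):
        states = pes_round(states, z, T)
        history.append(states)
    return history


history = simulate(L=200, rho0=0.7, z=8, T=1.4, rounds=50)
print(sum(history[-1]) / len(history[-1]))  # proportion of cooperators after 50 rounds
```

Each row of `history` is one round, so the list can be drawn directly as a space-time image; averaging the final proportion over many seeds would estimate $\rho_\infty$.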

Despite the equivalence with $d$-dimensional lattices in
the determination of the critical temptation values, the
one-dimensional case has several advantages [19, 20]: it is
easier to explain the invasion process of the cooperative/defective
clusters and also the $\rho_\infty$ oscillations during the
steady regime, as observed in the pioneering work of Nowak
and May [8, 17]. In addition to explaining these phenomena,
it is also possible to save the system history in
a single static image (see the spatio-temporal patterns).
This is impossible in a two-dimensional system,
for example, where a movie is necessary to observe
the time evolution.

In the steady state, the system can reach the cooperative,
chaotic or defective phases when adopting the DES,
and the cooperative or quasi-regular phases (the latter
not characterized before) when adopting the PES. The
cooperative phase is characterized by the majority of players
being cooperators. If the majority of players are defectors,
the system is in the defective phase. These two
phases are not sensitive to the initial configuration. In
these cases the fluctuations of $\rho_\infty$ (standard deviation,
SD) are almost null. In contrast, the chaotic phase is
highly sensitive to small changes in the initial configuration
(larger $\rho_\infty$ fluctuations, SD $\sim 0.5$).

In the quasi-regular phase, $\rho_\infty$ oscillates slightly around
$\rho_\infty \sim 0.5$, although a very large number of players
switch their states. These switches balance each other,
and the system is not sensitive to the
initial configuration, presenting a small SD, with SD $\sim 0$
over almost all the parameter space.

After the transient regime, the system reaches the
steady state with the asymptotic $\rho_\infty$. Adopting the PES,
the system can present only the cooperative or
quasi-regular phases. The defective and chaotic phases are
absent with the PES. The defective phase does not occur
because a defective cluster yields a negative payoff to its
members, who therefore change their states when this
happens. The absence of the chaotic phase is confirmed by the
small standard deviation, SD $\sim 0$, all over the parameter
space.

In the following we show the phase diagram of systems
adopting the PES, in which the cooperative and
quasi-regular phases are present, together with the patterns
that emerge during the transient and those that persist in the
steady state. The patterns are a visual way to understand
the phases.



A. Transient and Steady Regimes: exploration of 
the parameter space 

To depict the asymptotic proportion of cooperators in
the steady state, we have used surfaces showing $\rho_\infty$ as
a function of $T$ and $\rho_0$. Figs. 4 and 5 display these $\rho_\infty$
surfaces and their standard deviations for even (without
self-interaction) and odd (with self-interaction) $z$ values,
respectively. These phase diagrams present abrupt variations
when the system passes through $T_c$ and, eventually,
the system may go from the cooperative to the quasi-regular phase.




FIG. 4: $\rho_\infty$ surface as a function of the temptation value,
$T$, and the initial proportion of cooperators, $\rho_0$, for
$z = \{2; 8; 30\}$. Panels: (a) $\rho_\infty$ for $z = 2$; (b) $\rho_\infty$ standard
deviation for $z = 2$; (c) $\rho_\infty$ for $z = 8$; (d) $\rho_\infty$ standard
deviation for $z = 8$; (e) $\rho_\infty$ for $z = 30$; (f) $\rho_\infty$ standard
deviation for $z = 30$.

An interesting aspect is the $\rho_\infty$ symmetry with respect
to $\rho_0 = 1/2$, that is, $\rho_\infty(\rho_0 = 1/2 - \phi) = \rho_\infty(\rho_0 =
1/2 + \phi)$, with $0 \le \phi \le 1/2$. The presence or absence of
self-interaction changes the $T_c$ values and may change the phase
(cooperative or quasi-regular) for the same region of the
parameter space. Notice that there are few regions where
the standard deviation SD is significant.

In the PES, if all players are cooperators ($\rho_0 = 1$), they
always receive a positive payoff and no player changes
his/her state; therefore $\rho_\infty(T; \rho_0 = 1; z) = 1$. On the other
hand, if all players are defectors ($\rho_0 = 0$), in the first
round all players receive a negative payoff and all of
them switch their states, returning to the previous
situation; consequently, $\rho_\infty(T; \rho_0 = 0; z) = 1$. Thus, the
$\rho_\infty$ symmetry around $\rho_0 = 1/2$ is a consequence of the
PES. For a given $\rho_0$, $rN$ players receive a positive payoff and
$(1 - r)N$ players a negative payoff, whereas for $1 - \rho_0$,
$(1 - r)N$ players receive a positive payoff and $rN$ players
a negative payoff, generating the symmetry.

FIG. 5: $\rho_\infty$ surface as a function of the temptation value,
$T$, and the initial proportion of cooperators, $\rho_0$, for
$z = \{3; 9; 29\}$. Panels: (a) $\rho_\infty$ for $z = 3$; (b) $\rho_\infty$ standard
deviation for $z = 3$; (c) $\rho_\infty$ for $z = 9$; (d) $\rho_\infty$ standard
deviation for $z = 9$; (e) $\rho_\infty$ for $z = 29$; (f) $\rho_\infty$ standard
deviation for $z = 29$.

The projection of the surface $\rho_\infty(T, \rho_0, z)$ on the plane $\rho_\infty T$
shows $\rho_\infty$ as a function of $T$ for different $\rho_0$ values. In
Fig. 6 one sees the plots for some even and odd $z$ values.
The transitions in $\rho_\infty$ can be seen when the parameter $T$
passes through the critical temptation thresholds, $T_c$, given
by Eq. 7. In the plots, the $T_c$ values are marked by dashed
vertical lines. For example, in Fig. 6c ($z = 8$, without
self-interaction) $T_c = 5/3$. Meanwhile, in Fig. 6d ($z = 9$,
with self-interaction) $T_c = \{5/4; 2\}$.

The results show that the cooperative phase is more
prominent than the quasi-regular one. As $z$ increases,
the number of $T_c$ values rises. When the system
goes through $T_c$, $\rho_\infty$ varies. The independence from a
group gives each player more freedom to seek the best
outcome that satisfies his/her own aspiration level. In this
case, the worst result is the homogeneity among the players'
payoffs (quasi-regular phase) instead of cooperation,
the best result; this is still much better than defection.

FIG. 6: Asymptotic proportion of cooperators ($\rho_\infty$) as a
function of the temptation to defect, $T$, for
$z = \{2; 3; 8; 9; 29; 30\}$. The vertical dashed lines mark
$T_c$ in the plots and these values are given by Eq. 7. Blue:
$\rho_0 = 0.5$; green: $\rho_0 = 0.6$; red: $\rho_0 = 0.7$; cyan: $\rho_0 = 0.8$;
magenta: $\rho_0 = 0.9$. Panels: (a) $z = 2$; (b) $z = 3$; (c) $z = 8$;
(d) $z = 9$; (e) $z = 28$; (f) $z = 29$.



B. Spatio-temporal patterns 

The patterns formed by the cooperative/defective clusters
that emerge in the system result from local interactions
among players. When the players adopt the DES,
the differences between the payoffs of the border players are
fundamental to determine the system dynamics [19], while
in the PES these border payoff differences are not as
crucial. For a more detailed explanation
of the pattern formation, see Appendix A. The
spatio-temporal representation shows that these clusters can
form fingers and gliders, which can interact.

One may notice that clusters composed exclusively of
cooperators sustain themselves by maintaining cooperation
among their members. However, cooperation remains only
when the cooperative cluster is large enough to
maintain a positive payoff for its members. The members
placed at the borders are exploited by defectors, but they
do not switch their states because their payoffs are positive,
although lower than the payoffs of the exploiters
and of the inner players of the cooperative cluster.

If a defective cluster is large enough to produce negative
payoffs for its members, it will not be stable. Thus,
players with negative payoffs will switch their states
immediately. Thereby, the Tragedy of the Commons² does not
occur, because the negative payoff of the players does not
persist for more than one round, as happens for players
adopting the DES. In this way, the population mean payoff
is higher adopting the PES than the DES, where the
Tragedy of the Commons takes longer to vanish (when it
vanishes) [H, [2^. In the PES, a player uses his/her
own payoff to decide whether he/she will switch his/her
state or not. It is an individual decision based on the
personal aspiration level. Therefore, collective features,
i.e., players switching their states due to the environment
(group) in which they are inserted, may not occur.

Different neighborhood configurations may generate
positive or negative payoffs for the players. It is simple
to calculate the maximum defective cluster size and
the minimum cooperative cluster size that can remain
together during the system evolution (stable clusters).
In the cooperative cluster case, player $i$ does not switch
his/her state if there are at least $c_{\min}$ cooperators in
his/her neighborhood. This guarantees a positive payoff,
$G_i^{\min}(\theta_i = 1) > 0$; thus, from Eq. 3 with $P = -R$ and
$S = -T$, one has:

$$c_{\min} > z\left(\frac{T}{R + T}\right).$$

The situation is reversed in the defective cluster case. In
this case, player $i$ must have in his/her neighborhood
a maximum of $d_{\max}$ defectors, so that $G_i^{\max}(\theta_i = 0) > 0$,
and:

$$d_{\max} < z\left(\frac{T}{R + T}\right).$$
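The two bounds can be evaluated by a direct search over the PES payoffs, which avoids rounding issues at the threshold $zT/(R + T)$. This is a sketch with a hypothetical function name; it assumes $P = -R$, $S = -T$, a null aspiration level and, for odd $z$, that the player itself is counted among the $z$ neighbors.

```python
def stable_cluster_bounds(z, T, R=1.0):
    """Return (c_min, d_max) for connectivity z and temptation T.

    c_min: smallest number of cooperating neighbors giving a cooperator a positive payoff.
    d_max: largest number of defecting neighbors still giving a defector a positive payoff.
    """
    c_min = next(c for c in range(z + 1) if R * c - T * (z - c) > 0)
    d_max = max(d for d in range(z + 1) if T * (z - d) - R * d > 0)
    return c_min, d_max


for z, T in [(2, 1.5), (8, 1.5), (8, 1.9)]:
    print(z, T, stable_cluster_bounds(z, T), z * T / (1.0 + T))
```

The last printed column is the common threshold $zT/(R + T)$: $c_{\min}$ is the first integer above it and $d_{\max}$ the last integer below it.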

A finger is a cluster that extends along a straight
line during the time evolution. It can be simple (flat)
or complex (composed of regular oscillations, like a
sawtooth, for example). The finger interior can be composed
of cooperators or defectors only, or of intricate combinations of
cooperators and defectors. It may present symmetry with
respect to the central player of the pattern and periodicity in
the player states. A glider is a cluster that extends
diagonally and has the same features as the fingers.

In short, defective fingers may be composed of at most
$d_{\max}$ defectors, and cooperative ones of at least $c_{\min}$
cooperators. For instance, for $z = 2$, fingers formed by up to
two players are always smooth and continuous (see simple
and complex fingers in Appendix A). In general, the
stable clusters are the cooperative ones (with at least $c_{\min}$
cooperators) and the small defective ones. Defective clusters
larger than $d_{\max}$ destabilize within
a few rounds.

The transient regime is the time necessary for the
pattern interactions to cease or for the pattern propagation
to stabilize, and it varies with the parameter set used
(see Fig. 7). Another possibility is the emergence of the
quasi-regular phase, which is stationary with $\rho_\infty \sim 0.5$: a
very large number of players switch their states, but
no fingers or gliders emerge.

The intersection of cluster patterns generates very
interesting structures. For example, Fig. 7 illustrates the
presence of gliders³ that interact among themselves and
with fingers. These interactions can generate either simple
(Figs. 7a and 7d) or complex (Fig. 7c) fingers.



FIG. 7: Illustration of intersections of gliders with
fingers and with other gliders (horizontal axis: player). The
parameters of these simulations are: $L = 500$, $t = 300$ and (a) $T = 1.40$,
$\rho_0 = 0.7$, $z = 14$ (without self-interaction); (b) $T = 1.70$,
$\rho_0 = 0.3$, $z = 26$ (without self-interaction); (c) $T = 1.40$,
$\rho_0 = 0.5$, $z = 12$ (without self-interaction); and (d)
$T = 1.40$, $\rho_0 = 0.3$, $z = 24$ (without self-interaction).

² The Tragedy of the Commons occurs when multiple individuals act
independently, each aiming only at his/her own interest. When this
action is taken by multiple individuals simultaneously, it can destroy
the advantage desired by all of them, for example by exhausting the
desired resources in the environment, although this is not the purpose
of any of them.

³ In systems that adopt the DES, the inclination of a glider is
determined by the direction in which the players' states are updated. On
the other hand, if the system adopts the PES, the glider can propagate
either from left to right or vice versa.

One can understand the quasi-regular phase by observing
the behavior of the cooperative/defective clusters. If the
defectors of a particular cluster receive a negative payoff at
time $t$, these players switch their states to cooperation
at $t + 1$. If this action is synchronized among different
clusters, cooperation may emerge. Otherwise, if they are
not synchronized and one cluster switches at instant
$t$ and its neighbors at instant $t + 1$, these clusters
alternate between cooperation and defection, and there is
a balance between cooperators that switch to defection
and vice versa, keeping the proportion of
cooperators almost constant, with small oscillations.
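The distinction between the number of players who switch and the change in the proportion of cooperators can be made quantitative with two simple observables. The sketch below uses hypothetical function names and a hand-built toy configuration of two clusters that alternate out of phase.

```python
def cooperator_fraction(states):
    """Proportion of cooperators in a configuration (1 = cooperator, 0 = defector)."""
    return sum(states) / len(states)


def switch_fraction(previous, current):
    """Proportion of players whose state differs between two consecutive rounds."""
    return sum(p != c for p, c in zip(previous, current)) / len(current)


# Toy example: one cluster cooperates while the adjacent one defects, then they swap.
round_t = [1, 1, 0, 0]
round_t_plus_1 = [0, 0, 1, 1]
print(cooperator_fraction(round_t), cooperator_fraction(round_t_plus_1))
print(switch_fraction(round_t, round_t_plus_1))
```

In this toy case every player switches, yet the proportion of cooperators is the same in both rounds, which is the signature of unsynchronized alternating clusters.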

In Figs. 8 and 9 one sees some examples where the
synchronization among clusters has not occurred: $\rho_\infty \sim 0.5$
and many players switch their states at each round,
giving rise to the quasi-regular phase. The triangles appear
when adjacent defectors switch their states to cooperation
at the same time. In Fig. 8a a transient followed by
the quasi-regular phase with periodicity is presented.
Another triangular pattern also appears, whose interior
is not composed exclusively of cooperators or defectors,
but of complex cooperative and defective fingers. This
pattern is a triangle with poorly defined sides, which we
call triangle-like (see Fig. 8a). In Fig. 8b one can notice
that the cooperative clusters are larger for systems
with higher connectivity, but the system phase remains
quasi-regular.



[Figure: panels (a) and (b); horizontal axis: Player, 100 to 500]

FIG. 8: Formation and evolution of the quasi-regular system. The
parameters of these simulations are: L = 500, t = 300 and
(a) T = 1.90, ρ0 = 0.3 and z = 26 (without self-interaction) and
(b) T = 1.90, ρ0 = 0.5 and z = 29 (with self-interaction). Note the
presence of a triangle-like at time t = 50 at players 200-250.



In Fig. 9 one sees that a few defective clusters are enough to drive
the system to the quasi-regular phase instead of the cooperative
one, despite the high initial proportion of cooperators; this occurs
because T = 2. Fig. 9a shows a zoom of the defective clusters.
Fig. 9b illustrates a so-called triangle-like that emerges at
t = 260 around player 160.



[Figure: panels (a) and (b); horizontal axis: Player, 100 to 500]

FIG. 9: Formation and evolution of the quasi-regular system. The
parameters of this simulation are: L = 500, t = 300, T = 2.00,
ρ0 = 0.9 and z = 6 (without self-interaction). (a) Zoom of the
region around t = 1 for the players close to player 90; (b) zoom of
the region around t = 260 and player 160.

There are T intervals within which increasing or decreasing T alters
neither the system dynamics nor ρ∞ for the same system (identical
initial configuration of cooperators and z). In Fig. 10, for z = 24,
when the system passes through the critical temptation values
Tc = {1; 13/11; 7/5; 15/11; 8/5; 17/9}, the transient changes and
the patterns increase in quantity and variety; these are the phase
transitions. For instance, from T = 13/11 to T = 1.19 (see Figs. 10b
and 10c), some gliders appear in the initial steps and a complex
finger emerges and propagates throughout the system evolution. From
T = 1.39 to T = 7/5 (see Figs. 10c and 10d), the initial gliders
become more numerous and propagate during the time evolution, and
the finger that occurred before no longer emerges. Furthermore, for
1 < T < 2, the system presents the cooperative phase in the steady
regime, and for T = 2 (see Fig. 10g) the system enters the
quasi-regular phase.
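
To see where a simulated T value falls relative to the critical values quoted above, one can compare them as exact fractions. The sketch below is only an illustration: the list of Tc values is copied as printed in the text and then sorted numerically, and the helper name interval_of is hypothetical.

```python
from bisect import bisect_left
from fractions import Fraction

# Critical temptation values for z = 24, copied as printed, then sorted.
TC_Z24 = sorted(
    Fraction(a, b) for a, b in [(1, 1), (13, 11), (7, 5), (15, 11), (8, 5), (17, 9)]
)


def interval_of(T, critical=TC_Z24):
    """Return the pair of critical values that bracket T.

    If T coincides with a critical value, (T, T) is returned.
    None marks an open end below the first or above the last value.
    """
    T = Fraction(str(T))  # exact decimal, e.g. "1.19" -> 119/100
    k = bisect_left(critical, T)
    if k < len(critical) and critical[k] == T:
        return (T, T)
    lower = critical[k - 1] if k > 0 else None
    upper = critical[k] if k < len(critical) else None
    return (lower, upper)


if __name__ == "__main__":
    for T in ("1.01", "1.19", "1.40", "1.41", "1.67", "2.00"):
        print(T, interval_of(T))
```

Working with Fraction avoids floating-point rounding when a simulated value such as T = 1.40 is meant to sit exactly on a critical value such as 7/5.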

For systems adopting the PES, analyzing the cluster patterns we see
that the evolution depends strongly on the neighborhood composition
of the cluster. However, the player location in this neighborhood
(configuration) is irrelevant for determining the total payoff. We
have also noticed that the transient and steady regimes depend on
the system parameters: when T varies, the duration of the transient
regime changes. In the steady regime, the ρ∞ value changes when T
passes through the Tc values. A system can present the cooperative
or the quasi-regular phase. In the quasi-regular phase, the system
can exhibit a transient, after which it reaches a periodic ρ
oscillation.

[Figure: panels (a) to (g); horizontal axis: Player]

FIG. 10: Sequence of numerical simulations that show how changes in
the temptation T alter the cooperative/defective cluster patterns.
In the intervals between the presented T values no changes in the
patterns occur. The parameters of these simulations are: L = 500,
t = 300, ρ0 = 0.3 and z = 24 (without self-interaction).
(a) T = 1.00; (b) T = 1.01; (c) T = 1.19; (d) T = 1.40;
(e) T = 1.41; (f) T = 1.67; and (g) T = 2.00.

V. CONCLUSION

We have explored one-dimensional cellular automata in which each
cell is a player who plays the Prisoner's Dilemma with z neighbors
and adopts the Pavlovian Evolutionary Strategy. We have obtained the
analytical values of Tc. Using numerical results, we have validated
the existence of the phase transitions that were calculated
analytically. We also analyzed the stationary state of the system
and explained the cluster patterns that arise from the local
interactions of players and give rise to a new phase, the
quasi-regular one.

In short, our results are: (i) phase transitions occur at
well-defined values of the temptation Tc, which were determined
analytically; (ii) the cooperative and quasi-regular phases exist,
depending on the value of the temptation to defect; (iii) the
defective and chaotic phases are absent; (iv) the Tragedy of the
Commons does not take place; (v) cooperation is more pronounced than
in systems that adopt the DES.

The mean payoff of players is greater when the players are concerned
only with their own payoff. If the players copy the action of the
neighbor who received the largest payoff, they may worsen the
outcome of the whole system. Thus comparison, and therefore greed
for the greatest payoff, can cause the ruin of all. In a situation
where there is no way to coordinate the players' moves, it would be
best for everyone to seek a positive payoff, even if this payoff is
not the maximum possible. They could thus maximize the payoff of the
population as a whole.

APPENDIX A: PATTERNS FORMATION



In the following we analyze particular cases to explain the
evolution dynamics of these systems. In the zoom of the images of
Fig. 11 one sees that the complex finger with three players has the
pattern^: {D D D → D C D}, and with thirteen players the pattern is:
{D D D C D D D C D D D C D → D C D D D C D D D C D D D}. The pattern
of 13 players is a composition formed by the alternation of the
patterns of three players, with overlap (see Fig. 11b). In other
words, the patterns {D D D} and {D C D} combine so that the third
player of one pattern is the first of the following one. Other
combinations, formed by adding patterns with or without overlapping
edges, can be observed.

^ C: cooperator player, D: defector player. The player at the center
of the pattern is printed in boldface.



[Figure: panels (a) and (b); horizontal axis: Player, 100 to 500]

FIG. 11: Formation and evolution of smooth and complex fingers.
Parameters of this simulation are: L = 500, t = 300, T = 2.00,
ρ0 = 0.3 and z = 3 (with self-interaction). (a) Zoom of the marked
area; (b) formation of a pattern composed from elementary patterns.
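
The overlap rule described for the thirteen-player finger, where the last player of one three-player pattern is the first player of the next, can be checked with a short sketch. The function name compose_with_overlap is hypothetical, and the strings are a plain-text reading of the patterns quoted in the text.

```python
def compose_with_overlap(patterns):
    """Chain patterns so that the last player of one is the first of the next."""
    if not patterns:
        return ""
    result = patterns[0]
    for pattern in patterns[1:]:
        if result[-1] != pattern[0]:
            raise ValueError("adjacent patterns do not share an edge player")
        # Skip the first player of the new pattern: it overlaps the previous edge.
        result += pattern[1:]
    return result


# Six three-player patterns with five overlaps give 6 * 3 - 5 = 13 players.
thirteen = compose_with_overlap(["DDD", "DCD"] * 3)
print(thirteen, len(thirteen))
```

Starting the alternation with {D C D} instead of {D D D} yields the other configuration of the thirteen-player finger.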



In the zoom of Fig. 12 there are simple fingers with at most twelve
defectors and also a complex one with the pattern:
{6D 3C 4D 3C 6D → 6D 3D 4C 3D 6D}. In the zooms of Fig. 13, the
emerging fingers have the patterns: Fig. 13:
{4D C 3D 3C D 4D → 4D D 3C 3D C 4D} and Fig. 13d:
{C D 2C D 2C D C → C D C 3D C D C → 3D 3C 3D → 2D C 3D C 2D →
D C D 3C D C D → D C 5D C D → C D 5C D C}. Note the periodicity
present in these patterns.
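
The compact notation used for these fingers (for example 3D for three consecutive defectors) can be expanded mechanically, which makes it easy to confirm that every configuration in a cycle spans the same number of players. The sketch below is illustrative only: the function name expand is hypothetical, and the token spacing of the Fig. 13d cycle is a reading of the printed pattern.

```python
import re


def expand(pattern):
    """Expand run-length notation such as '3D 3C 3D' into 'DDDCCCDDD'."""
    players = []
    for token in pattern.split():
        match = re.fullmatch(r"(\d*)([CD])", token)
        if match is None:
            raise ValueError(f"unrecognised token: {token!r}")
        count = int(match.group(1)) if match.group(1) else 1
        players.append(match.group(2) * count)
    return "".join(players)


# Successive configurations of the finger quoted for Fig. 13d (assumed spacing).
CYCLE_13D = [
    "C D 2C D 2C D C",
    "C D C 3D C D C",
    "3D 3C 3D",
    "2D C 3D C 2D",
    "D C D 3C D C D",
    "D C 5D C D",
    "C D 5C D C",
]

if __name__ == "__main__":
    for state in CYCLE_13D:
        print(expand(state))
```

Printing the expanded rows one under another gives a small space-time picture of the finger, with each row being one round.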
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